Substructural cooperativity and parallel versus sequential events 

during protein unfolding 
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According to the 'old view', proteins fold along well-defined sequential pathways, whereas the 
'new view' sees protein folding as a highly parallel stochastic process on funnel-shaped energy 
landscapes. We have analyzed parallel and sequential processes on a large number of Molecular 
Dynamics unfolding trajectories of the protein CI2 at high temperatures. Using rigorous statistical 
measures, we quantify the degree of sequentiality on two structural levels. The unfolding process 
is highly parallel on the microstructural level of individual contacts. On a coarser, macrostructural 
level of contact clusters, characteristic parallel and sequential events emerge. These characteristic 
events can be understood from loop-closure dependencies between the contact clusters. A correlation 
analysis of the unfolding times of the contacts reveals a high degree of substructural cooperativity 
within the contact clusters. 
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Introduction 

How do proteins fold into their native structure? In 
1968, Levinthal suggested that proteins are guided along 
sequential pathways into the native structure, since an 
unguided search of the vast conformational space seemed 
incompatible with fast and efficient folding About a 
decade ago, a 'new view' emerged in which folding is 
seen as a parallel process on funnel-shaped landscapes, 
inspired by simple statistical-mechanical models (for re- 
views, see klEl). The bias of the funnel landscapes to- 
wards the native protein structure ensures efficient fold- 
ing along a multitude of routes. An intriguing question 
is whether the apparently contradictory 'old view' of se- 
quential folding and 'new view' of parallel folding can be 
reconciled 0001 

We explore here parallel and sequential events during 
unfolding of CI2. The protein CI2 is a central model 
system for folding, because of its prominent role as first 
protein for which two-state kinetics has been observed 
and an extensive mutational analysis of the kinetics d 
The folding kinetics of CI2 has been investigated theoret- 
ically both with atomistic (j|T5lilElEllllillJll[ll 
and simplified statistical-mechanical models |19l |2(1 l2ll 
I22I I23L l24l I25I ] . Since atomistic folding simulations are 
still limited to small or ultrafast folding proteins [2(1 |27l 
El Ell!! EJI32, MD unfolding simulations at elevated 
temperatures are an important tool to study the kinetics 

Daggett and coworkers have used MD unfolding simula- 
tions of CI2 to characterize the transition-state ensemble 
[I5EE BB IIi| Lazaridis and Karplus @ and Fer- 
rara et al. [la.ll7| have extracted characteristic sequences 
of events from average unfolding times of contacts in MD 
simulations. 

In this article, we quantify the degree of sequential- 
ity during unfolding for each pair of native contacts, and 
each pair of contact clusters. The contact clusters in the 
native contact map of CI2 correspond to the four main 
substructural elements: the a-helix, and the three strand 
pairings 0x04, 0203, and 0304 (see Figs. ^ and We 



consider the sequences of unfolding events of contacts and 
contact clusters on each trajectory. The unfolding of a 
pair of contacts or contact cluster is defined as sequential 
if the same sequence of events is observed on essentially 
all trajectories. The pairwise unfolding is parallel if con- 
tact (or contact cluster) A unfolds prior to contact (or 
contact cluster) B on some of the trajectories, and later 
than B on other trajectories. 

We find characteristic parallel and sequential unfolding 
events of the contact clusters. The three contact clusters 

0203 and 0304 unfold essentially parallel to each other, 
but sequentially after the contact cluster 0i0x- This un- 
folding scenario is in agreement with a simple folding 
model based on the loop-closure dependencies between 
the contact clusters [42l 143) . In the model, the entropic 
loop-closure cost for forming the highly nonlocal contact 
cluster 0x04 is reduced by the previous formation of the 
other three contact clusters. This leads to the prediction 
that 0i04 folds after the other three clusters, i.e. in re- 
verse sequence to our MD unfolding scenario. In the sim- 
ple model, the three contact clusters a, 0203 and 0304 
fold in parallel since the loop-closure cost for forming 
these cluster does not depend on the sequence in which 
they are formed. 




FIG. 1: Structure of the protein CI2 0. 
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FIG. 2: Native contacts and contact clusters of CI2. Black 
dots represent contacts between pairs of amino acids that 
where present for at least 75% of the simulation time on an 
exemplary MD-trajectory at 300 K starting from the crystal 
structure. Two amino acids here are defined to be in contact 
if the distance between their C a or atoms is less than 6A. 

These characteristic unfolding events are also reflected 
in the pairwise sequencing of the contacts. However, on 
the level of individual contacts, the parallelity of unfold- 
ing events is clearly increased. Our results thus illus- 
trate that a highly parallel picture of folding is obtained 
on microstructural levels [3j , here the level of individual 
contacts. An obvious reason for this increase is that an 
unfolding trajectory on the contact level is specified by 
an opening sequence of 69 contacts. On the cluster level, 
trajectories are specified by a sequence of only four ele- 
ments, the contact clusters. Another reason is that paral- 
lel events occur also within clusters on the contact level, 
not only between clusters. A pairwise correlation analysis 
of the unfolding times of the contacts shows a high degree 
of correlation within the contact clusters. The contact 
clusters thus represent cooperative substructures. 

Model and methods 

Molecular dynamics simulations 

We have performed MD simulations with the 
CHARMM EFF1 force neldjll E3- EEF1 is a force 
field with implicit solvent [42j and has been previously 
used by sever al g roups to study the unfolding kinetics 
of proteins |ol I38L l39t liof . including the protein CI2 5]. 
After minimization of the CI2 crystal structure (protein 
data bank code: 2ci2) by 1000 steepest-descent and 1000 
adopted-Newton-Raphson minimization steps and after 
5 nanoseconds (ns) of equilibration, we have performed 
a 100 ns simulation at the temperature 300 K. The aver- 
age root-mean-square deviation (RMSD) of the C a atoms 
in the simulation with respect to the crystal structure 



was 2.0 A, see Fig. [3J We took 50 conformations from 
this trajectory as starting conformations for the thermal 
unfolding simulations at 400 K, 450 K and 500 K. The 
length of the individual unfolding simulations depended 
on the vanishing of the native contacts and was about 
100 ns at 400 K, 10 ns at 450 K, and 1 ns at 500K. We 
have performed 30 unfolding simulations at 400 K, and 
50 unfolding simulations at 450 and 500 K. 



Unfolding times and sequences 

To define the unfolding times for a specific contact on a 
trajectory at 400 K, we consider time intervals of length 
150 ps and determine the probability that the contact is 
formed during this interval. The unfolding time of this 
contact is defined as the time at which the probability 
first falls below the threshold value 0.05. In other words, 
the unfolding time of a contact is defined as the midpoint 
of the first 150 ps interval during which the contact was 
only present 5% of the time. For the trajectories at 450 
and 500 K, we use shorter time intervals of length 54 
ps and 10.5 ps, respectively, to define contact unfolding 
times. We consider here as native contacts all contacts 
that were present during at least 75% of an exemplary 
trajectory at 300 K (see Fig- |2I> ■ I n a given conformation, 
two residues were taken to be in contact if the distance 
between their C a or atoms was less than 6 A. [5^| 

Besides contacts, we consider here contact clusters as 
coarser structural level. The four contact clusters of 
the protein CI2 correspond to the ev-helix and the three 
strand pairings 0203, and /3s/34 (see Fig. For 

each of the clusters, we determine the fraction of clus- 
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FIG. 3: Root-mean-square deviation (RMSD) of the C Q 
atoms with respect to the crystal structure along a 100 
nanosecond MD-trajectory at the temperature 300 K. The 
average value of 2.0 A for the C Q -RMSD indicates that the 
protein CI2 is stable in the CHARMM force field EEF1 at 
300 K on this timescale. The RMSD values here are averaged 
over time intervals of 250 ps to integrate out small-time-scale 
fluctuations. During unfolding at high temperatures between 
400 and 500 K, the C a -RMSD attains values of 12 A and 
higher (data not shown). 
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FIG. 4: Fraction of cluster contacts on an exemplary unfold- 
ing trajectory at the temperature 450 K. The four contact 
clusters a, Pif3i, P2P3, and /34/3s of CI2 are defined in Fig. [5] 
The contact fractions are averaged over time intervals of 50 
ps to integrate out small-time-scale fluctuations. The three 
dashed horizontal lines represent thresholds used to define 
the unfolding sequence of the contact cluster. In this exam- 
ple, /3i/34 is defined to unfold first because its contact fraction 
crosses all threshold lines prior to the contact fractions of the 
other clusters. By the same definition, the cluster ^2^3 here 
unfolds last. In this example, the clusters a and ^3^4 un- 
fold 'simultaneously' since a crosses the threshold lines at the 
contact fraction 0.15 and 0.1 earlier than P3P4,, but threshold 
line at 0.05 later. 

ter contacts formed during a trajectory (see Fig. 0. To 
define the unfolding sequence of clusters on a trajectory, 
we consider several threshold values for the fraction of 
cluster contacts. If two clusters unfold more or less si- 
multaneously, the sequence in which they cross different 
threshold values can vary. We define a cluster to unfold 
before another cluster if it crosses all threshold values 
before that cluster. We have considered here 7 threshold 
values between 0.05 and 0.2, in intervals of 0.025. 



Results and discussion 

Parallel and sequential unfolding of contact clusters 

A statistical analysis of the unfolding events on the 
contact cluster level is presented in Fig. |5| The numbers 
indicate the fractions of trajectories on which a given 
cluster unfolds prior to another cluster. At the tempera- 
ture 400 K, for example, /3i/?4 unfolds prior to a on 83% 
of the trajectories, and a unfolds before fiifi^ on 7% of 
the trajectories. On the remaining 10% of trajectories, 
the two clusters unfold simultaneously, i.e. without clear 
sequence. 

At all three temperatures, the cluster Pifii unfolds 
with high probability prior to the other clusters (the 
numbers in the second row of the matrices are between 
0.83 and 1), and unfolds with low probability after the 



T=400 K unfolds second 





a 


P1P4 


P2P3 


P3P4 


a 




0.07 


0.87 


0.43 


P1P4 


0.83 




1.00 


0.90 


P2P3 


0.00 


0.00 




0.10 


P3P4 


0.20 


0.00 


0.77 
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T=500 K unfolds second 
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FIG. 5: Unfolding statistics for the contact clusters of CI2 at 
the temperatures 400 K, 450 K, and 500 K. The numbers rep- 
resent the fraction of unfolding trajectories on which contact 
cluster x unfolds prior to contact cluster y at all considered 
thresholds (see Fig. |1J. At 400 K, for example, the cluster 
I3\j3i opens prior to a on 83% of the trajectories, and a opens 
prior to /3if3i on 7% of the trajectories. On the remaining 
10% of trajectories, the two clusters unfold 'simultaneously', 
i.e. the sequence of unfolding events depends on the consid- 
ered threshold. 



other clusters (the numbers in the second column are be- 
tween and 0.07). The unfolding of /3i/?4 with respect 
to each of the other three clusters thus is sequential. In 
contrast, the two clusters a and /33/?4 unfold in paral- 
lel. At the temperatures 400 K and 450 K, unfolds 
prior to a with probabilities around 0.2, and after a with 
probabilities slightly larger than 0.4. On the remaining 
close to 40% of trajectories, the two clusters unfold si- 
multaneously. At these temperatures, the unfolding of 
the two clusters is parallel with a 2 to 1 preference for a 
unfolding prior to flipi on the trajectories where the two 
clusters do not unfold simultaneously. At the tempera- 
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FIG. 6: Unfolding statistics for native cluster contacts of CI2 
at 400 K, 450 K, and 500 K. The colors represent the fraction 
of unfolding trajectories on which contact x unfolds prior to 
contact y. For the cluster a, the contacts are arranged in 
the order 12/16, 13/16, 14/17, 15/18, 16/19, 17/20, 18/21, 
18/22, 19/22, 20/23, 21/24, 21/15, 22/25 ('N to C termi- 
nus'). For /3i/?4, the contacts are arranged as 14/56, 13/56, 
13/57, 12/57, 11/57, 12/58, 11/58, 10/58, 9/58, 10/61, 9/62, 
7/61, 6/62, 5/62, 5/63, 4/64 (increasing contact order). For 
P2P3, the order of contacts is 28/46, 28/48, 29/46, 29/47, 
30/48, 31/49, 32/50, 32/52, 33/49, 33/50, 33/51, 33/52, 
34/52, 34/54, 35/51, 35/52, 35/53. For /3 3 /3 4 , the contacts 
are arranged in the order 52/56, 53/57, 52/57, 52/58, 53/59, 
51/58, 52/59, 50/58, 51/60, 50/62, 49/62, 49/63, 48/64, 
47/64, 47/65, 45/64 (increasing contact order). 

ture 500 K, the unfolding of a and is parallel with 
reversed preferences. 

We observe a more pronounced temperature depen- 
dence for the unfolding sequences of (3203 and a. The 
cluster unfolds sequentially after a at 400 K, and 

parallel to a at 500 K. Similarly, the unfolding of P2P3 
and /?3/?4 has a stronger parallel character at the higher 
temperatures 450 K and 500 K compared to 400 K, with 
a preference for P3P4 opening first. 

The overall picture emerging from these statistics is: 

(1) /?i/?4 unfolds prior to the other three clusters, and 

(2) the three clusters a, /?2/?3 and /?3/?4 unfold predomi- 
nantly parallel to each other, with increasing parallelity 
at higher temperatures. This picture is in agreement with 
a simple folding model based on the loop-closure depen- 
dencies between the clusters In this model, the 
length of the loop that has to be closed to form a contact 
cluster is estimated via the graph-theoretical concept of 
effective contact order (ECO) HHH. The EC0 is the 
length of the shortest path between two residues that are 
forming a contact. The elementary steps along this path 
either are covalent contacts between neighboring residues 
in the chain, or previously formed noncovalent contacts 
between residues. In the case of the protein CI2, the 
ECO of the highly nonlocal cluster P1P4 is reduced by the 
previous formation of the other three clusters, whereas 
the ECOs of these three clusters do not depend on the 
formation of other clusters. On the dominant folding 
route with smallest loop-closure cost, /3i/?4 is therefore 
predicted to form after the other three clusters, which 
fold in parallel. This folding sequence is reversed com- 
pared to the MD unfolding sequence. However, we can 
not exclude that our MD unfolding sequence may also re- 
sult from on average weaker contact energies for the /3i/?4 
contact cluster, compared to the other three clusters. 

Unfolding sequences and correlations of individual contacts 



unfolds second 




The unfolding statistics of all pairs of cluster contacts 
is summarized in Fig. El The precise order in which the 
contacts of the four clusters are presented is specified in 



FIG. 7: Spearman correlation coefficients for the unfolding ^qq ^ 

times of contacts. High correlations between pairs of contacts 
are represented in black, low correlations in white. High cor- 
relations are observed predominantly between contacts of the 
same contact cluster. The contact clusters thus correspond 
to cooperative protein substructures. The contacts are pre- 
sented in the same order as in Fig. 5. - For the 30 trajectories 
at 400 K, a Spearman correlation coefficient of 0.43 has a p- 
value of 0.01, and a correlation coefficient 0.55 a p-value of 
0.001 For the 50 trajectories at 450 and 500 K, the cor- 
relation coefficients 0.33 and 0.43 have the p-values 0.01 and 
0.001, respectively. The p-value of a correlation coefficient is 
the probability that a similar or higher correlation is obtained 
by chance. The p-value is a measure for the significance of an 
observed correlation coefficient. Low p- values indicate high 
significance. 



the figure caption. Blue colors indicate high probabilities 
for unfolding sequencies, and red colors low probabilities. 
Green colors represent intermediate probabilities, which 450 K 

correspond to parallel events. At all three temperatures, 
the contacts of /3i/?4 unfold with high probabilities prior 
to the contacts of the other clusters. At the temperature 
400 K, the majority of contacts have a strong ten- 
dency to unfold after the contacts of the clusters a and 
/?3/?4. This tendency decreases with increasing tempera- 
ture. The unfolding statistics on the level of individual 
contacts thus reflects the parallel and sequential events 
on the cluster level. 

The correlations of the contact unfolding times are rep- 
resented in Fig. [3 The contacts are given in the same 
order as in Fig. Here, black indicates high correla- 
tions, and white low correlations. The correlations are 
quantified by the Spearman rank correlation coefficient. 
To calculate the Spearman coefficient, one has to con- 
sider the pairs of unfolding times (ai,6i), (02,62), • • •, 
(ajsr,bjsr) of two contacts A and B from all N trajecto- 
ries. The unfolding times at of contact A are then ranked 500 K 
according to their magnitude, and the unfolding times hi 
of contact B as well. The Spearman rank correlation 
coefficient is defined as 

N d 2 

r = l-6V— 4 r (1) 

N(N 2 - 1) V ; 

where di is the rank difference between a, and 6^. The 
Spearman rank correlation can attain values between —1 
and 1, with 1 representing perfect correlation, and —1 
perfect anticorrelation. A value of 1 is obtained if the 
smallest unfolding time of contact A is paired with the 
smallest unfolding time of contact B, the next-smallest 
unfolding time of A with the next-smallest unfolding time 
of B, etc. The rank difference di of all pairs of unfolding 
times then is zero. The Spearman correlation coefficient 
is a simple analogue of the Pearson correlation coefficient. 
The Spearman correlation is preferable here because it 
is less sensitive to outliers and, hence, is a more robust 
measure of correlation. 
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We obtain high correlations mostly between contacts 
of the same contact cluster. The correlations of these 
contacts are represented in the four sub-matrizes along 
the diagonal of the matrizes. The high correlations indi- 
cate a high degree of substructural cooperativity within 
contact clusters. For the cluster a, these correlations de- 
crease with increasing temperature. For the clusters 0104 
and ,03,04, the high correlations between contacts of the 
same cluster mostly appear in two 'sub-blocks'. The con- 
tacts of these clusters are ordered according to increasing 
contact order (see caption of Fig.[fjJ). The contact order of 
a contact between residues i and j simply is the sequence 
separation \i— j\. In the contact map shown in Fig. El the 
cluster /?i 04 has a small gap between the contacts 14/56 
to 9/58 with smaller contact order and the contacts 10/61 
to 4/64 with slightly larger contact order. A similar gap 
also appears in cluster fisfi^. The two sub-blocks in the 
correlations between (3\f3± contacts correspond to high 
correlations within each of the two groups of contacts. 
This is also the case for the two sub-blocks in the (3^Pa 
correlations. A comparison with Fig.^lalso reveals a ten- 
dency for 'zipping' in /?i04 and /?3/?4, i.e. contacts with 
higher contact order in these clusters have a tendency 
to unfold earlier than contacts with lower contact order. 
This can be seen from the dominance of blue colors be- 
low the diagonals and red colors above the diagonals of 
the sub-matrizes in Fig. that represent the unfolding 
statistics within the /?i/?4 and (5^(5^ cluster. 

Comparison with MD simulation results of other groups 

A statistical analysis of MD unfolding sequences of the 
protein CI2 has also been performed by Lazaridis and 
Karplus [3 and Ferrara et al. 0, 0|. Lazaridis and 
Karplus J5j have considered the average times for the 
last appearance of contacts in unfolding simulations of 
CI2 at the temperature 500 K. They found the smallest 
average times for contacts between (3i and ,04, the next- 
largest average times for contacts between 0% and ,04 and 
for contacts within the a-helix, and obtained the largest 
avera ge t imes for contacts between and (3%. Ferrara 
et al. [l3 have considered the average C a RMSDs of 
conformations for which groups of contacts disappeared 
first and appeared last. The C a RMSD with respect to 
the native state here served as progress variable for un- 
folding. Ferrara et al. found the smallest average RMSD 
values at disappearance, i.e. early unfolding, for the 0104 
and ,03,04 contact groups, followed by RMSD values for 
the ,02,03 contact groups, and obtained the largest aver- 
age RMSD values at disappearance of the contacts of the 
a-helix. The on average early unfolding of pi 04 observed 
by the two groups is in agreement with our results. 

However, our analysis can not be directly compared to 



sequences of average unfolding times. We identify on each 
trajectory the unfolding sequences of pairs of contacts or 
contact clusters, and subsequently estimate probabilities 
for particular sequences from the numbers of times these 
sequences appear among all trajectories. The purpose of 
this analysis is to determine characteristic parallel and 
sequential unfolding events. Average unfolding times do 
not reveal this information. For example, a larger av- 
erage unfolding time for contact A than for contact B is 
observed if this contact unfolds after contact B on all tra- 
jectories (sequential unfolding), but can also be obtained 
if contact A opens after contact B on some trajectories, 
and prior to contact B on other trajectories (parallel un- 
folding) . 

In our analysis, we have focused on parallel and sequen- 
tial events, and have not considered transition states for 
unfolding. The reason is that the unfolding scenario at 
the high temperatures considered here is not a two-state 
scenario, but rather resembles a 'downhill-unfolding' sce- 
nario. In such a scenario, the initial state of the simula- 
tion, the folded state, is instable rather than metastable, 
see Fig. QJ The unfolding process is then downhill in free 
energy and does not involve the crossing of a significant 
transition-state barrier. Putative transition state struc- 
tures have been extracted from high-temperature simu- 
lations with a conformational clustering method [TTL Il4j . 
At lower temperatures, two-state folding and unfolding 
has been observed in MD simulations of peptides and 
small mini-proteins [H |H El H El HI • 



Conclusions 

We have quantified the degree of scqucntiality for pairs 
of contacts and contact clusters during thermal unfold- 
ing of CI2. On the level of contact clusters, the charac- 
teristic sequential event is the unfolding of 0104 prior 
to the clusters a, and ,03/34. The unfolding of 

these other three clusters is predominantly parallel. This 
unfolding scenario is also reflected on the contact level. 
The structural level of individual contacts is comparable 
to the microstate level of simpler statistical-mechanical 
models for protein folding. On this level, the unfolding 
process is highly parallel because of the large number of 
viable unfolding sequences of the 69 contacts. A cor- 
relation analysis of the unfolding times of the contacts 
reveals high correlations predominantly within contact 
clusters. The contact clusters thus correspond to cooper- 
ative protein substructures. Experimentally, cooperative 
substructures have been recently observed during cold 
unfolding of the protein Ubiquitin |49J and in equilibrium 
and kinetic hydrogen exchange studies of Cytochrome C 
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